{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "tags": [
     "remove-cell"
    ]
   },
   "outputs": [],
   "source": [
    "import sys\n",
    "import os\n",
    "if not any(path.endswith('textbook') for path in sys.path):\n",
    "    sys.path.append(os.path.abspath('../../..'))\n",
    "from textbook_utils import *"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "C9asOtmc0K6v"
   },
   "source": [
    "# HTTP\n",
    "\n",
    "HTTP (aka HyperText Transfer Protocol) is an all-purpose infrastructure to access resources on the web. There are a tremendous number of datasets available to us on the internet, and with HTTP we can acquire these datasets."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "PfSmQf_a0K6w"
   },
   "source": [
    "The internet allows computers to communicate with each other, and HTTP places a structure on the communication. HTTP is a simple *request-response* protocol, where a client submits a *request* to a server in a specially formatted text message, and the server sends a specially formatted text  *response* back. The client might be a web browser or our Python session."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "An HTTP request has two parts: a header and an optional body. The header must follow a specific syntax. An example request to obtain the Wikipedia page shown in {numref}`Figure %s <fig-wiki-1500>` looks like the following\n",
    "\n",
    "```\n",
    "GET /wiki/1500_metres_world_record_progression HTTP/1.1\n",
    "Host: en.wikipedia.org\n",
    "User-Agent: curl/7.65.2\n",
    "Accept: */* \n",
    "{blank_line}\n",
    "``` "
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The first line contains three pieces of information: it starts with the method of the request, which is GET; this is followed by the URL of the web page we want; and last is the protocol and version. Each of the three lines that follow give auxiliary information for the server. This information has the format `name: value`. Finally, a blank line marks the end of the header. Note that we've marked the blank line with `{blank_line}` in the preceding snippet; in the actual message, this is actually a blank line."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{figure} figures/Wikipedia1500mScreen23-02-24.png\n",
    "---\n",
    "name: fig-wiki-1500\n",
    "---\n",
    "\n",
    "Screenshot of the [Wikipedia page](https://en.wikipedia.org/wiki/1500_metres_world_record_progression) with data on the world record for the 1,500-meter race\n",
    "```"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The client's computer sends this message over the internet to the Wikipedia server. The server processes the request and sends a response, which also consists of a header and body. The header for the response looks like this:\n",
    "\n",
    "```\n",
    "< HTTP/1.1 200 OK\n",
    "< date: Fri, 24 Feb 2023 00:11:49 GMT\n",
    "< server: mw1369.eqiad.wmnet\n",
    "< x-content-type-options: nosniff\n",
    "< content-language: en\n",
    "< vary: Accept-Encoding,Cookie,Authorization\n",
    "< last-modified: Tue, 21 Feb 2023 15:00:46 GMT\n",
    "< content-type: text/html; charset=UTF-8\n",
    "...\n",
    "< content-length: 153912\n",
    "{blank_line}\n",
    "```\n",
    "\n",
    "The first line states that the request completed successfully; the status code is 200. The next lines give additional information for the client. We shortened this header quite a bit to focus on just a few pieces of information that tell us the content of the body is HTML and uses UTF-8 encoding, and the content is 153,912 characters long. Finally, the blank line at the end of the header tells the client that the server has finished sending header information, and the response body follows."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "HTTP is used in almost every application that interacts with the internet. For example, if you visit this same Wikipedia page in your web browser, the browser makes the same basic HTTP request as the one just shown. When it receives the response, it displays the body in your browser's window, which looks like the screenshot in {numref}`Figure %s <fig-wiki-1500>`. \n",
    "\n",
    "In practice, we do not write out full HTTP requests ourselves. Instead, we use tools like the `requests` Python library to construct requests for us. The following code constructs the HTTP request for the page in {numref}`Figure %s <fig-wiki-1500>` for us. We simply pass  the URL to `requests.get`. The \"get\" in the name indicates the GET method is being used:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 346,
     "status": "ok",
     "timestamp": 1523471119234,
     "user": {
      "displayName": "TIFFANY THERESA JANN",
      "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
      "userId": "107371321413131637774"
     },
     "user_tz": 420
    },
    "id": "bWtSMNiJ8PUu",
    "outputId": "ad66ba9e-17d8-4a48-e09e-7434f9995974"
   },
   "outputs": [],
   "source": [
    "import requests\n",
    "\n",
    "url_1500 = 'https://en.wikipedia.org/wiki/1500_metres_world_record_progression'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "resp_1500 = requests.get(url_1500)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can check the status of our request to make sure the server completed it successfully:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "base_uri": "https://localhost:8080/",
     "height": 34
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 306,
     "status": "ok",
     "timestamp": 1523471565520,
     "user": {
      "displayName": "TIFFANY THERESA JANN",
      "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
      "userId": "107371321413131637774"
     },
     "user_tz": 420
    },
    "id": "vTFq3Tdm0K7H",
    "outputId": "ebd84dfe-04ff-4da2-e452-ce227b2f0dbc"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "200"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resp_1500.status_code"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can thoroughly examine the request and response through the object's attributes. As an example, let's take a look at the key-value pairs in the header in our request:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "cellView": "code",
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     },
     "base_uri": "https://localhost:8080/",
     "height": 86
    },
    "colab_type": "code",
    "executionInfo": {
     "elapsed": 340,
     "status": "ok",
     "timestamp": 1523471831894,
     "user": {
      "displayName": "TIFFANY THERESA JANN",
      "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128",
      "userId": "107371321413131637774"
     },
     "user_tz": 420
    },
    "id": "H7uZ1npO0K67",
    "outputId": "6380dea2-0764-4841-bc1c-06acd2539b4b"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "User-Agent: python-requests/2.25.1\n",
      "Accept-Encoding: gzip, deflate\n",
      "Accept: */*\n",
      "Connection: keep-alive\n"
     ]
    }
   ],
   "source": [
    "for key in resp_1500.request.headers:\n",
    "    print(f'{key}: {resp_1500.request.headers[key]}')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Although we did not specify any header information in our function call, `request.get` provided some basic information for us. If we need to send special header information, we can specify them in our call."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "lxI7P4jm0K7A"
   },
   "source": [
    "Now let's examine the header of the response we received from the server:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "20"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "len(resp_1500.headers)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As we saw earlier, there's a lot of header information in the response. We just display the `date`, `content-type`, and `content-length`:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     }
    },
    "colab_type": "code",
    "id": "38aHtUBe0K7B",
    "outputId": "28e49080-9321-41ae-acfc-8537ca7fa43a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "date: Fri, 10 Mar 2023 01:54:13 GMT\n",
      "content-type: text/html; charset=UTF-8\n",
      "content-length: 23064\n"
     ]
    }
   ],
   "source": [
    "keys = ['date', 'content-type', 'content-length' ]\n",
    "for key in keys:\n",
    "    print(f'{key}: {resp_1500.headers[key]}')"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we display the first several hundred characters of the response body (the entire content is too long to display nicely here):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'<!DOCTYPE html>\\n<html class=\"client-nojs vector-feature-language-in-header-enabled vector-feature-language-in-main-page-header-disabled vector-feature-language-alert-in-sidebar-enabled vector-feature-sticky-header-disabled vector-feature-page-tools-disabled vector-feature-page-tools-pinned-disabled vector-feature-toc-pinned-enabled vector-feature-main-menu-pinned-disabled vector-feature-limited-width-enabled vector-feature-limited-width-content-enabled\" lang=\"en\" dir=\"ltr\">\\n<head>\\n<meta charset=\"UTF-8\"/>\\n<title>1500 metres world record progression - Wikipedia</title>\\n<script>document.documentE'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resp_1500.text[:600]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We confirm that the response is an HTML document and that it contains the title `1500 metres world record progression - Wikipedia`. We have successfully retrieved the web page shown in {numref}`Figure %s <fig-wiki-1500>`."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "0nkJHMtJ0K7b"
   },
   "source": [
    "Our HTTP request has been successful, and the server has returned a status code of `200`. There are hundreds of other HTTP status codes. Thankfully, they are grouped into categories to make them easier to remember (see {numref}`Table %s <response-codes>`)."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ":::{table} Response status codes\n",
    ":name: response-codes\n",
    "\n",
    "| Code   | Type    |  Description                                                                  |\n",
    "|--------|---------|-------------------------------------------------------------------------------|\n",
    "| 100s   | Informational | More input is expected from the client or server (100 Continue, 102 Processing, etc.). |\n",
    "| 200s   | Success  | The client's request was successful (200 OK, 202 Accepted, etc.). |\n",
    "| 300s   | The redirection | Requested URL is located elsewhere and may need user's further action from the user (300 Multiple Choices, 301 Moved Permanently, etc.).  |\n",
    "| 400s   | Client error |  A client-side error  occurred(400 Bad Request, 403 Forbidden, 404 Not Found, etc.).         |\n",
    "| 500s   | Server error |  A server-side error  occurred or the server is incapable of performing the request (500 Internal Server Error, 503 Service Unavailable, etc.). |\n",
    "\n",
    ":::"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One common error code that might look familiar is 404, which tells us we have requested a resource that doesn't exist. We send such a request here: "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "colab": {
     "autoexec": {
      "startup": false,
      "wait_interval": 0
     }
    },
    "colab_type": "code",
    "id": "5SANbo8w0K7c",
    "outputId": "a0bc9752-f9c0-44c9-8a14-758e38ca02d1"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "404"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "url = \"https://www.youtube.com/404errorwow\"\n",
    "bad_loc = requests.get(url)\n",
    "bad_loc.status_code"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jWRTnKQ90K7V"
   },
   "source": [
    "The request we made to retrieve the web page was a `GET` HTTP request. There are four main HTTP request types: GET, POST, PUT, and DELETE. The two most commonly used methods are `GET` and `POST`. We just used GET to retrieve the web page:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "'GET'"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "resp_1500.request.method"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "jWRTnKQ90K7V"
   },
   "source": [
    "The `POST` request is used to send specific information from the client to the server. In the next section, we use `POST` to retrieve data from Spotify."
   ]
  }
 ],
 "metadata": {
  "colab": {
   "collapsed_sections": [],
   "default_view": {},
   "name": "ajkim@berkeley.edu_task1.ipynb",
   "provenance": [],
   "version": "0.3.2",
   "views": {}
  },
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.4"
  },
  "toc": {
   "nav_menu": {},
   "number_sections": false,
   "sideBar": false,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": true,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
